F3. Continuous random variables
(cont F2) Variance
Let \(\mu\) be the expected value for the random variable \(X\)
The variance describes the spread around the expected value. More specifically, the variance is the expected value of the squared distance between \(X\) and its expected value.
\(V(X) = E((X-E(X))^2)\)
or
\(V(X) = E((X-\mu)^2)\)
The reason for squaring the distances is that there will be both negative and positive distances, and they can ‘cancel each other out’ if summed directly.
The standard deviation is \(\sqrt{V(X)}\) and is a measure of spread on the same scale as the random variable \(X\).
(cont F2) Variance for a discrete random variable
\(X\) is a discrete random variable
\(V(X)=\sum_{\text{all x}} (x-\mu)^2P(X=x)\)
Let \(X= \text{"number of dots"}\)
We have earlier derived that \(E(X)=3.5\)
\[\begin{split} & V(X)=\sum_{x=1}^6(x-3.5)^2\cdot \frac{1}{6} = \\ & \frac{1}{6} ((1-3.5)^2+(2-3.5)^2+(3-3.5)^2+(4-3.5)^2+(5-3.5)^2+(6-3.5)^2) = \\ & \frac{1}{6} ((-2.5)^2 + (-1.5)^2+ (-0.5)^2+ (0.5)^2+ (1.5)^2+ (2.5)^2) = \frac{17.5}{6} \end{split}\]
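The die calculation above can be checked numerically. The following is a minimal sketch in Python, assuming a fair six-sided die; the variable names are chosen only for this illustration.

```python
# Fair six-sided die: each outcome 1..6 has probability 1/6.
outcomes = range(1, 7)
p = 1 / 6

# E(X): sum of outcome times probability
mean = sum(x * p for x in outcomes)

# V(X) = E((X - mu)^2): sum of squared distance times probability
variance = sum((x - mean) ** 2 * p for x in outcomes)

# The standard deviation is on the same scale as X
std_dev = variance ** 0.5

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```

The variance agrees with \(\frac{17.5}{6} \approx 2.92\).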
On a specific road the random variable \(X = \text{number of accidents during a week}\) has the following distribution
| Outcome (x) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| \(P(X = x)\) | 0.70 | 0.20 | 0.06 | 0.04 |
Calculate the expected number of accidents during a week
\[\begin{split} & E(X) = \sum_{x=0}^3 x\cdot P(X=x) = \\ & 0\cdot 0.70 + 1 \cdot 0.20 + 2\cdot 0.06+ 3\cdot 0.04 = 0.44 \end{split}\]
\(\therefore E(X)=\mu = 0.44\)
The symbol \(\therefore\) means 'therefore' and marks the conclusion.
What is the variance? (this was not on the exam)
\[\begin{split} & V(X)=\sum_{x=0}^3 (x-\mu)^2\cdot P(X=x) =\sum_{x=0}^3 (x-0.44)^2\cdot P(X=x)=\\ & (0-0.44)^2\cdot 0.70 + (1-0.44)^2 \cdot 0.20 + (2-0.44)^2\cdot 0.06+ (3-0.44)^2\cdot 0.04 =\\ & 0.6064 \end{split}\]
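The same two sums work for any discrete distribution given as a table. Below is a sketch with a small helper, `mean_and_variance`, which is a hypothetical name introduced here and not part of the course material. It is applied to the accident distribution from the table above.

```python
def mean_and_variance(pmf):
    """Return (E(X), V(X)) for a discrete distribution.

    pmf maps each outcome x to its probability P(X = x).
    """
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance


# Number of accidents during a week, from the table above
accidents = {0: 0.70, 1: 0.20, 2: 0.06, 3: 0.04}

# The probabilities must sum to 1 (up to floating-point rounding)
total_probability = sum(accidents.values())

mu, var = mean_and_variance(accidents)
print(round(mu, 4), round(var, 4))
```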
(cont F2) Probability and distribution functions for a discrete r.v.
Pair the following \(f(x)\) with \(F(x)\)
\(X=\text{"number of spams per hour"}\)
Model: \(X \sim Po(\lambda_X)\) where \(\lambda_X=0.5\)
What is the probability of receiving at least 6 spams in one day?
The first thing we need to do is convert the model’s intensity parameter so that it gives the number over the correct time unit. From per hour to per day.
1 spam per hour corresponds to 24 spams per day.
\(Y=\text{"number of spams per day (24 hours)"}\)
Model: \(Y \sim Po(\lambda_Y)\) where \(\lambda_Y=24\cdot 0.5 = 12\)
\[\begin{split} & P(Y\geq 6) = 1 - P(Y < 6) = \\ & 1-P(Y\leq 5) = 1 - F_Y(5) = \\ & 1- 0.0203 = 0.9797 \end{split}\]
Alternatively, one can calculate \(F_Y(5)\) directly from the probability function with \(\lambda_Y = 12\)
\[\begin{split} F_Y(5) = & P(Y=0)+P(Y=1)+ \dots + P(Y=5) =\\ & \frac{12^0e^{-12}}{0!} + \frac{12^1e^{-12}}{1!} +\dots +\frac{12^5e^{-12}}{5!} \end{split}\]
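The spam example can be reproduced without a table by summing the Poisson probability function. This sketch uses the daily intensity \(\lambda_Y = 24 \cdot 0.5 = 12\) from the model above; `poisson_pmf` is a helper name introduced only for this illustration.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


lam_per_hour = 0.5
lam_per_day = 24 * lam_per_hour  # convert the intensity from per hour to per day

# F_Y(5) = P(Y = 0) + P(Y = 1) + ... + P(Y = 5)
cdf_5 = sum(poisson_pmf(x, lam_per_day) for x in range(6))

# Complementary event: P(Y >= 6) = 1 - P(Y <= 5)
p_at_least_6 = 1 - cdf_5

print(round(cdf_5, 4), round(p_at_least_6, 4))
```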
(cont F2) Expected value for a Poisson distribution
Difficult! Not included in the material for the course, but useful to have heard about.
\(X \sim Po(\lambda)\) and \(f(x) = \frac{\lambda^xe^{-\lambda}}{x!}\)
\[\begin{split} & E(X) = \sum_{x=0}^{\infty}x\cdot f(x) = \sum_{x=1}^{\infty}x\cdot f(x) = \\ & \sum_{x=1}^{\infty}x\cdot \frac{\lambda^xe^{-\lambda}}{x!} = \sum_{x=1}^{\infty}\frac{\lambda^xe^{-\lambda}}{(x-1)!} = \\& e^{-\lambda} \sum_{x=1}^{\infty}\frac{\lambda^x}{(x-1)!} = \\ & \lambda \cdot e^{-\lambda} \sum_{x=1}^{\infty}\frac{\lambda^{x-1}}{(x-1)!} = \\ & \lambda \cdot e^{-\lambda} \sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = \lambda\cdot e^{-\lambda}\cdot e^{\lambda} = \lambda \end{split}\]
In the second-to-last step we used the mathematical result that \(\sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = e^{\lambda}\)
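The result \(E(X) = \lambda\) can also be sanity-checked numerically by truncating the infinite sum. This is only a sketch: the cut-off at 100 terms is an assumption that works for the small intensities used in these notes, because the remaining terms are negligibly small.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def poisson_mean_truncated(lam, n_terms=100):
    """Approximate E(X) = sum over x of x * f(x), using the first n_terms terms."""
    return sum(x * poisson_pmf(x, lam) for x in range(n_terms))


mean_half = poisson_mean_truncated(0.5)    # intensity per hour in the spam example
mean_twelve = poisson_mean_truncated(12)   # intensity per day in the spam example

print(round(mean_half, 6), round(mean_twelve, 6))
```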
Continuous random variables
- A continuous random variable \(X\) can take any value in an interval, so it has uncountably many possible values. This means that the probability of any single value is zero:
\[P(X =x) = 0\]
- Instead, we study probability for intervals, e.g. the interval \([a,b]\):
\[P(a \leq X \leq b)\]
- The distribution of a continuous r.v. \(X\) can be described by a density function (probability density function, PDF)
\[f_X(x) \geq 0\]
Density function for a continuous r.v.
\[f(x) = \left\{ \begin{array}{lr} \frac{1}{b-a} & a \leq x \leq b\\ 0 & \text{otherwise} \end{array}\right.\]
A uniform distribution is suitable for a r.v. taking values in an interval with equal probability.
\[f(x) = \left\{ \begin{array}{lr} \lambda\cdot e^{-\lambda x} & x \geq 0\\ 0 & \text{otherwise} \end{array}\right.\]
The exponential distribution takes non-negative values \(x \geq 0\).
It is a suitable distribution for describing the time it takes for an event to occur, such as waiting time for a bus or getting an appointment with the doctor.
\(f(x) = \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\) where \(-\infty < x < \infty\)
A normal distribution has two parameters, \(\mu\) and \(\sigma^2\). They coincide with the expected value and the variance of the distribution.
Distribution function for a continuous r.v.
Probability corresponds to an area under the density function
The distribution function is the area up to the outcome \(x\)
\[F(x)=\int_{-\infty}^{x} f(v)dv\]
- The total area under the density function is always 1
\[\int_{-\infty}^{\infty} f(x)dx = 1\]
- \(P(X < x) = P(X\leq x)\) for continuous r.v. (not for discrete r.v.)
The random variable \(X\) is uniformly distributed in the interval 0 to 10.
We know that the density function is \[f(x) = \left\{ \begin{array}{lr} \frac{1}{10} & 0 \leq x \leq 10\\ 0 & \text{otherwise} \end{array}\right.\]
What is the probability that \(X\) is less than or equal to 7?
\[\begin{split} & P(X \leq 7) = F(7) = \int_{-\infty}^7 f(x)dx = \\ & \int_0^7 \frac{1}{10}dx = [\frac{x}{10}]_{x=0}^{7} = \\ & \frac{7}{10} - \frac{0}{10} = \frac{7}{10} \end{split}\]
The random variable \(X\) is exponentially distributed with the parameter \(\lambda = \frac{3}{2}\)
We know that \[f(x) = \left\{ \begin{array}{lr} \lambda\cdot e^{-\lambda x} & x \geq 0\\ 0 & \text{otherwise} \end{array}\right.\]
What is the probability that \(X\) is less than or equal to 2?
\[\begin{split} & P(X \leq 2) = F(2) = \int_{-\infty}^2 f(x)dx = \\ & \int_{0}^2 \lambda\cdot e^{-\lambda x}dx =\int_{0}^2 \frac{3}{2}\cdot e^{-\frac{3}{2} x}dx = \\ & \left[-e^{-\frac{3}{2} x}\right]_{x=0}^{2} = -e^{-\frac{3}{2}\cdot 2} - \left(-e^{-\frac{3}{2} \cdot 0}\right) = \\ & -e^{-3} + 1 = 1 - e^{-3} \approx 0.950 \end{split}\]
Distribution function for an exponential distribution
The distribution function for an exponential distribution is
\[F(x) = 1 - e^{-\lambda x} \quad \text{for } x \geq 0, \text{ and } F(x) = 0 \text{ for } x < 0\]
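Since a probability is an area under the density function, the two worked examples (uniform on 0 to 10, exponential with \(\lambda = \frac{3}{2}\)) can be checked by numerical integration. The sketch below uses a simple midpoint rule; the helper `area` and the number of sub-intervals are choices made for this illustration, not part of the course material.

```python
import math


def area(f, a, b, n=100_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def uniform_density(x):
    """Density of the uniform distribution on [0, 10]."""
    return 1 / 10 if 0 <= x <= 10 else 0.0


lam = 3 / 2


def exponential_density(x):
    """Density of the exponential distribution with parameter lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


# P(X <= 7) for the uniform distribution: the density is 0 below 0
p_uniform = area(uniform_density, 0, 7)

# P(X <= 2) for the exponential distribution, numerically and with F(x) = 1 - e^(-lam x)
p_exp_numeric = area(exponential_density, 0, 2)
p_exp_formula = 1 - math.exp(-lam * 2)

print(round(p_uniform, 4), round(p_exp_numeric, 4), round(p_exp_formula, 4))
```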
Complementary event for a continuous r.v.
\(P(X \geq x) = 1 - P(X < x) \underbrace{ =}_{P(X=x)=0} 1 - P(X \leq x)\)
Probability over an interval
\(P(a < X \leq b) = P(X \leq b) - P(X \leq a)\)
\(P(-2 < X \leq 1)\)
Expected value and variance for a continuous r.v.
\(X\) is a continuous random variable
\(\mu= E(X) = \int_{-\infty}^{\infty} xf(x)dx\)
\(\sigma^2 = V(X) = \int_{-\infty}^{\infty} (x-\mu)^2f(x)dx\)
\(X \sim Exp(\lambda)\)
\(E(X) = \frac{1}{\lambda}\)
\(V(X) = \frac{1}{\lambda^2}\)
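The integrals for \(\mu\) and \(\sigma^2\) can be approximated numerically and compared with \(\frac{1}{\lambda}\) and \(\frac{1}{\lambda^2}\). The sketch below assumes \(\lambda = 1.5\) and cuts the integration off at \(40/\lambda\), where the remaining tail is negligible. Both the cut-off and the midpoint helper `integrate` are assumptions made for this illustration.

```python
import math


def integrate(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


lam = 1.5


def f(x):
    """Density of Exp(lam) for x >= 0."""
    return lam * math.exp(-lam * x)


upper = 40 / lam  # truncation point; the density is about lam * e^(-40) here

# mu = integral of x * f(x); the density is 0 for x < 0, so start at 0
mean = integrate(lambda x: x * f(x), 0, upper)

# sigma^2 = integral of (x - mu)^2 * f(x)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), 0, upper)

print(round(mean, 4), round(1 / lam, 4))
print(round(variance, 4), round(1 / lam ** 2, 4))
```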
Discrete and continuous r.v.
Normal distribution
The normal distribution is useful and often appears when describing natural phenomena
The normal distribution is a good description of random variation for sums of independent and identically distributed random variables
We will spend a lot of time on the normal distribution in this course
There is a trick for getting the value of the distribution function for any parameter values
Density function for a normal distribution
\(X \sim N(\mu,\sigma)\)
Some textbooks and software use the variance as the second parameter, \(N(\mu,\sigma^2)\). In these notes the second parameter is the standard deviation \(\sigma\).
The density function for a normal distribution looks like a church bell
The normal distribution is symmetric around \(\mu\)
\(F(\mu - x) = 1 - F(\mu + x)\), which for \(\mu = 0\) reads \(F(-x) = 1 - F(x)\)
- Mode, median and expected value coincide for a normal distribution
Distribution function for a normal distribution
\(X \sim N(\mu,\sigma)\)
\[\begin{split} P(X \leq 0.1) & = F(0.1) = \int_{-\infty}^{0.1}f(x)dx = \\ & \int_{-\infty}^{0.1}\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx \end{split}\]
Let us assume that \(\mu=0\) and \(\sigma=1\)
\[=\int_{-\infty}^{0.1}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx = \dots\text{has no closed-form solution and must be evaluated numerically}\]
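In software this integral is usually evaluated through the error function. The sketch below relies on the identity \(\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right)\), which is a standard result but is not derived in these notes. The function name `standard_normal_cdf` is introduced only for this illustration.

```python
import math


def standard_normal_cdf(z):
    """Distribution function of N(0, 1), expressed through the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# P(X <= 0.1) when mu = 0 and sigma = 1
p = standard_normal_cdf(0.1)

# By symmetry, exactly half of the probability lies below 0
half = standard_normal_cdf(0.0)

print(round(p, 4), half)
```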
Distribution function for a normal distribution - table
Instead of calculating the integral we can use
- tables
- calculators/computer programs
If we only have a table, how do we handle all possible values of the expected value \(\mu\) and the variance \(\sigma^2\)?
The solution is to standardise the distribution
Standardised Normal distribution
\(X \sim N(3,4)\)
Create a new r.v. \(Z = \frac{X-3}{4}\)
One can show that \(Z \sim N(0,1)\) which is a standardised normal distribution.
The following holds \(X = 3 + 4\cdot Z\)
The distribution function for a standardised normal distribution is denoted \(\Phi(x)\) and has a table
\[\begin{split} & P(X \leq 4) = P(\frac{X-3}{4} \leq \frac{4-3}{4}) = \\ & P(Z \leq 0.25) = \Phi(0.25) \underbrace{= 0.5987}_{\text{from table}} \end{split}\]
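The standardisation step can be verified with Python's standard library. `statistics.NormalDist` takes the expected value and the standard deviation as parameters, which matches the \(N(\mu,\sigma)\) convention of these notes. This is a sketch that compares the direct calculation with the table route.

```python
from statistics import NormalDist

X = NormalDist(mu=3, sigma=4)  # X ~ N(3, 4), second parameter is the standard deviation
Z = NormalDist()               # standardised normal distribution N(0, 1)

# Direct calculation of P(X <= 4)
p_direct = X.cdf(4)

# Standardise first: z = (4 - 3) / 4 = 0.25, then look up Phi(0.25)
z = (4 - 3) / 4
p_standardised = Z.cdf(z)

print(round(p_direct, 4), round(p_standardised, 4))
```

Both routes give the same probability, which is why a single table for \(\Phi\) is enough.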
Standardised normal distribution and normal distribution
Let \(Z \sim N(0,1)\)
Then \(X = \mu + \sigma \cdot Z\) is also normally distributed with expected value \(\mu\) and variance \(\sigma^2\), i.e.
\[X \sim N(\mu,\sigma)\]
Let \(X \sim N(5,2)\)
\[\begin{split} & P(X \geq 0) = 1 - P(X < 0) = 1 - P(X \leq 0) = \\ & 1 - P(\frac{X-5}{2} \leq \frac{0-5}{2}) = 1 - \Phi(\frac{-5}{2}) = \\ & 1 - (1-\Phi(\frac{5}{2})) = \Phi(2.5) \approx 0.9938 \end{split}\]
The weight of a skier with equipment is modelled by a normal distribution with expected value 80 kg and variance \(36 \text{ kg}^2\). The skier Kim is alone in the lift. What is the probability that Kim's weight exceeds 90 kg?
Let \(X = \text{"weight in kg"}\)
Model: \(X \sim N(80,6)\)
\[\begin{split} & P(X > 90) = 1 - P(X \leq 90) = \\ & 1 - P(\frac{X-80}{6} \leq \frac{90-80}{6}) = 1 - \Phi(\frac{10}{6}) \approx 0.048 \end{split}\]
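The two upper-tail probabilities above, \(P(X \geq 0)\) for \(X \sim N(5,2)\) and the skier's weight with \(X \sim N(80,6)\), can be computed with the complementary event. A minimal sketch using `statistics.NormalDist`, where the second parameter is the standard deviation (so variance 36 gives `sigma=6`):

```python
from statistics import NormalDist

# X ~ N(5, 2): P(X >= 0) = 1 - P(X <= 0)
p_nonnegative = 1 - NormalDist(mu=5, sigma=2).cdf(0)

# Skier: variance 36 kg^2 means standard deviation 6 kg
weight = NormalDist(mu=80, sigma=6)
p_over_90 = 1 - weight.cdf(90)

# Same skier probability via standardisation: 1 - Phi(10 / 6)
p_over_90_standardised = 1 - NormalDist().cdf((90 - 80) / 6)

print(round(p_nonnegative, 4), round(p_over_90, 4))
```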
Quantile
A quantile divides a probability distribution into two parts.
\[P(X \leq x_{.98}) = 0.98\]
or
\[P(X > \lambda_{.02}) = 0.02\]
Examples of quantiles
Median – the quantile that divides the distribution into two halves with 50% probability each
Quartiles – the quantiles that split a distribution into four parts with equal probability:
- First quartile (Q1)
- Second quartile = Median
- Third quartile (Q3)
Percentile – the p:th percentile is the value \(x\) such that \(P(X \leq x) = p\%\), i.e. p% of the probability lies at or below it
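A quantile is obtained by inverting the distribution function. The sketch below assumes a standardised normal distribution purely as an example (the definitions above hold for any distribution) and uses `NormalDist.inv_cdf` from the standard library.

```python
from statistics import NormalDist

Z = NormalDist()  # example distribution: N(0, 1)

# Median and quartiles: 50%, 25% and 75% of the probability lie below them
median = Z.inv_cdf(0.50)
q1 = Z.inv_cdf(0.25)
q3 = Z.inv_cdf(0.75)

# The quantile x_.98 satisfies P(Z <= x_.98) = 0.98 ...
x_98 = Z.inv_cdf(0.98)
# ... and the same point leaves 2% above it: P(Z > x_.98) = 0.02
upper_tail = 1 - Z.cdf(x_98)

print(round(q1, 4), round(median, 4), round(q3, 4), round(x_98, 4))
```

For a symmetric distribution centred at 0 the first and third quartiles differ only in sign.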
Quantiles illustrated with a distribution function
Quantiles illustrated with a density function
Quantiles illustrated with a boxplot
Quantiles of the normal distribution
We will use quantiles from a standardised normal distribution to create statistical tests and confidence intervals
The table sheet contains some commonly used quantiles
Extra examples
Let \(X \sim N(5,2)\)
- \(P(X \leq 6) = P(\frac{X-5}{2} \leq \frac{6-5}{2}) = \Phi(\frac{1}{2})\)
\[\begin{split} & P(1.8 < X < 7.2) = P(X < 7.2) - P(X \leq 1.8) = \\ & \Phi(\frac{7.2-5}{2})-\Phi(\frac{1.8-5}{2}) = \Phi(1.1) - \Phi(-1.6) = \\ & \Phi(1.1) - (1 - \Phi(1.6)) = 0.8643 - (1 - 0.9452) = 0.8095 \approx 0.810 \end{split}\]
- Find \(a\) such that \(P(X \leq a) = 0.05\)
Let \(Z\) follow the standardised normal distribution, \(Z \sim N(0,1)\)
We know the following: \(P(X \leq a) = P(Z \leq \frac{a-5}{2})\)
If we can find the quantile for \(Z\), then we can derive the quantile for \(X\)
From the quantile table we see that \(P(Z \leq z_{.05}) = 0.05\) when \(z_{.05} = -1.645\)
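If the table only lists upper-tail quantiles \(\lambda_{\alpha}\), defined by \(P(Z > \lambda_{\alpha}) = \alpha\), use the symmetry \(\lambda_{1-\alpha} = -\lambda_{\alpha}\), so that \(z_{.05} = \lambda_{.95} = -\lambda_{.05} = -1.645\)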
Then \(x_{.05} = 5 + 2 \cdot z_{.05} = 5 + 2 \cdot (-1.645) = 1.71\)
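The extra examples for \(X \sim N(5,2)\) can be checked as follows. This is a sketch using `statistics.NormalDist`; the quantile is computed both directly and by transforming the standard normal quantile as in the derivation above.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=2)  # X ~ N(5, 2), sigma is the standard deviation
Z = NormalDist()               # Z ~ N(0, 1)

# P(X <= 6) = Phi(1/2)
p_le_6 = X.cdf(6)

# P(1.8 < X < 7.2) = P(X < 7.2) - P(X <= 1.8)
p_interval = X.cdf(7.2) - X.cdf(1.8)

# Find a such that P(X <= a) = 0.05
z_05 = Z.inv_cdf(0.05)            # quantile of the standardised distribution
a_from_z = 5 + 2 * z_05           # transform back: x = mu + sigma * z
a_direct = X.inv_cdf(0.05)        # same quantile computed directly

print(round(p_le_6, 4), round(p_interval, 4), round(z_05, 3), round(a_from_z, 2))
```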